Abstract

This study analyzes the Arizona policy of utilizing a single assessment of English 
proficiency to determine if students should be exited from the ELL program, which is 
ostensibly designed to make it possible for them to succeed in the mainstream classroom 
without any further language support. The study examines the predictive validity of this 
assessment instrument on ELLs’ performance on state required academic achievement 
tests at three grade levels. It finds that at subsequent grade levels after redesignation, the
“one-test” AZELLA becomes less predictive of academic achievement. That is, the test
overpredicts student achievement, suggesting that many students may be under-served
because of their scores on the test. This finding calls into question Arizona’s “one-test”
procedure for redesignating ELLs to a non-service category. Given the large and 
increasing size of the ELL student population in Arizona, the current focus on testing and 
accountability, and the documented problems in current assessment practices, 
improvement in instruments and procedures is critical. These improvements are 
necessary at all phases of the assessment process, but as this study indicates, the present 
policy is likely denying services that these students need and violating their right
to an equal educational opportunity.

Introduction



The Challenge of Assessing ELLs 

The increasing demand for evaluation, assessment, and accountability in education 
comes at a time when the fastest growing student population in Arizona is children 
whose home language is not English. This presents several challenges to practitioners 
and school systems generally when they lack familiarity with important concepts such as 
second language acquisition, acculturation, and the role of socioeconomic background as 
they relate to test development, administration, and interpretation. Because assessment is 
key in developing and implementing effective curricular and instructional services that 
are required to promote student learning, English language learner (ELL) children have
the right to be assessed to determine their educational needs. Through individual
assessments, teachers can personalize instruction, make adjustments to classroom
activities, assign children to appropriate program placements, and have more informed
communication with parents. They can also identify learning problems that may require
additional outside assistance. Educational systems, in turn, need to know how ELLs are
performing in order to make proper adjustments to their programs and to effect necessary
policy changes. Notwithstanding the increasing need, states have struggled to develop
strong assessment programs with appropriate instruments for use with young ELLs.

Although hundreds of languages are represented in schools in the United States,
Spanish is the most common. Nationally, almost 80% of all ELL students speak Spanish
as their first language (Gandara & Rumberger, 2009); in Arizona the percentage is even
higher, closer to 85% (xxx). While some Spanish language tests exist, most lack the
technical qualities required of high-quality assessment tools, or the specifications needed
to serve the accountability purposes of NCLB. Additionally, there is a shortage of
bilingual professionals with the skills necessary to evaluate these children (Arias, 2009).
The intent of this article is to describe the challenges inherent in assessing young English
language learners, to review important principles associated with the development of
such assessments, and to present an empirical analysis indicating that Arizona’s
current assessment instrument does not yield valid inferences about ELLs’ readiness to
undertake English-only instruction.

Young English Language Learners: Who Are They? 

Several terms are used in the literature to describe children from diverse language 
backgrounds in the United States. A general term describing children whose native 
language is other than English, the mainstream societal language in the US, is language 
minority. This term is applied to non-native English speakers regardless of their current 
level of English proficiency. Other common terms are English language learner (ELL),
English learner (EL), and limited English proficient (LEP). These terms are used
interchangeably to refer to students whose native language is other than English, and
whose English proficiency is not yet developed to a point where they can profit fully
from English instruction or communication. In this article, the term English language
learner and its respective abbreviation are preferred, as they place an emphasis on students’
learning and progress rather than their limitations.

Young ELLs (generally considered to be between 0 and 8 years of age) have been the
fastest growing student population in the country over the past few decades, due
primarily to increased rates of immigration. Currently, one in four school-age children in
Arizona has a foreign-born parent (Capps et al., 2005), and many, though not all, of these
children learn English as a second language. Overall, the population of children
speaking a non-English native language in Arizona rose from 16 percent in 1979 to 27
percent in 1999 (NCELA, 2006), and the number of language minority students in K-12
schools has recently been estimated to be over 120,000 (August, 2006).

Assessing the development of ELLs demands an understanding of who these
children are in terms of their linguistic and cognitive development, as well as the social
and cultural contexts in which they are raised. The key distinguishing feature of these
children is their non-English language background. In addition to linguistic background,
other important attributes of ELL children include their ethnic, immigrant, and
socioeconomic histories (Abedi, Hofstetter, & Lord, 2004; Capps et al., 2005; Figueroa
& Hernandez, 2000; Hernandez, 2006). Though diverse in their origins, ELL students, on
average, are more likely than their native English-speaking peers to have an immigrant
parent, to live in low-income families, and to be raised in cultural contexts that do not
reflect mainstream norms in the US (Capps et al., 2005; Hernandez, 2006).

Decades of research support the notion that children can competently acquire two 
or more languages (Garcia, 2005). Relationships of linguistic properties between 
languages are complex, and several theories have been presented over the years to 
explain how language develops for young bilingual children. Among the major 
theoretical approaches, available empirical evidence suggests that transfer theory best 
explains the language development of young children managing two or more languages 
(Genesee, Geva, Dressler, & Kamil, 2006). This theoretical position asserts that certain
linguistic skills from the native language transfer to the second. In like manner, errors or
interference in second language production occur when grammatical differences
between the two languages are present. Language that is contextually embedded and
cognitively undemanding (that is, automatic, over-learned communication) does not lend
itself well to transfer. Contextually reduced and cognitively demanding language skills,
on the other hand, tend to transfer more easily between languages. Higher order
cognitive skills relevant to academic content are more developmentally interdependent
and, therefore, amenable to transfer (Genesee, Geva, Dressler, & Kamil, 2006). In the
process of cross-linguistic transfer, it is normal for children to mix (or “code-switch”)
between languages. Mixing vocabulary, syntax, phonology, morphology, and pragmatic
rules serves as a way for young bilingual children to enhance meaning. Because
language use is context-driven, the bilingual child’s choice of language depends on
characteristics of and the particular relationship with the addressee(s), as well as the
cultural identity and attitudinal features of the child and the child’s overall comfort with
the language.

Assessment Issues 

Although many young ELLs have immigrant parents or caregivers, the vast majority of
these students are native-born US citizens and have been legally granted the same rights
to education as their native English-speaking peers. Benefiting from valid educational
assessment is included in these rights. While the current knowledge base and legal and
ethical standards governing ELL assessment are limited, they are sufficient to provide
guidance for the development of appropriate and valid assessment. Making
improvements on existing assessments will require commitments from policymakers and
practitioners to develop and implement appropriate assessment tools and procedures, to
link assessment results to improved practices, and to utilize trained staff capable of
carrying out these tasks. Researchers and scholars can facilitate the improvement of
assessment practices by continuing to evaluate implementation strategies in schools, and
by developing systematic assessments of contextual factors relevant to linguistic and
cognitive development. Assessments of contextual processes will be necessary if current
assessment strategies, which largely focus on the individual, are to improve classroom
instruction, curricular content, and, therefore, student learning (Rueda, 2007; Rueda &
Yaden, 2006).

Reasons to assess 

Several skills and developmental abilities of young children are assessed in early
education programs, including preschool and the first few elementary school years.
Sensing an increase in demand for greater accountability and enhanced educational
performance of young children, the National Education Goals Panel developed a list of
principles to guide early educators through appropriate and scientifically sound
assessment practices (Shepard, Kagan, & Wurtz, 1998). Moreover, the panel presented
four purposes for assessing young children. Pertinent as well to the assessment of young
ELL children, the purposes were a) to promote children’s learning and development, b)
to identify children for health and special services, c) to monitor trends and evaluate
programs and services, and d) to assess academic achievement to hold individual
students, teachers, and schools accountable (i.e., high stakes testing) (Committee on the
Foundations of Assessment, 2001; National Research Council, 2008; Shepard, Kagan, &
Wurtz, 1998). Embedded within each of these purposes are important considerations for
practice so as to preserve assessment accuracy and support interpretations of results that
lead to increased educational opportunity for the student.

Legal and ethical precedent 

The impetus for appropriate and responsive assessment practices of young ELLs
is supported by a number of legal requirements and ethical guidelines, which have
developed over time. Case law, public law, and ethical codes from professional
organizations support the use of sound assessment tools, practices, and test
interpretations. A widely cited set of testing standards is found in a recent publication
from the American Psychological Association (APA), the American Educational 
Research Association (AERA), and the National Council on Measurement in Education 
(NCME) entitled Standards for Educational and Psychological Testing (1999). Revised 
from the 1985 version, in its fourth edition, this volume offers a number of ethical 
standards for assessing the psychological and educational development of children in 
schools, including guidelines on test development and application. Included is a chapter 
on testing children from diverse linguistic backgrounds, which discusses the irrelevance 
of many psychoeducational tests developed for and normed with monolingual, English- 
speaking children. Caution is given to parties involved in translating such tests without 
evaluating construct and content validity and developing norms with new and relevant 
samples. It also discusses accommodation recommendations, linguistic and cultural 
factors important in testing, and important attributes of the tester. Similar, though less
detailed, provisions exist in the Professional Conduct Manual published by the National
Association of School Psychologists (2000). 

It has been argued that the standards presented by APA, AERA and NCME have
outpaced present policy, practice, and test development (Figueroa & Hernandez, 2000).
However, the federal Individuals with Disabilities Education Act (IDEA 2004) does
provide particular requirements related to the assessment of ELLs. It requires, for
example, the involvement of parents/guardians in the assessment process as well as a
consideration of the child’s native language in assessment. Unlike ethical guidelines,
which often represent professional aspirations and are not necessarily enforceable, public
law requires compliance. The Office for Civil Rights (OCR) is charged with
evaluating compliance with federal law and, where necessary, auditing public programs
engaged in assessment practices and interpretations of test performance by ELLs and other
minority children.

Assessment practice: use and misuse 

Several domains of development are assessed during the early childhood years. 
These include cognitive (or intellectual), linguistic, socioemotional (or behavioral), 
motor, and adaptive (or daily living) skills, as well as hearing, vision, and health factors. 
Educational settings are primarily concerned, however, with the cognitive, academic, and 
linguistic development of children. Other domains are of interest insofar as they impact 
students’ educational well-being, as stated in IDEA (2004). This section focuses 
primarily on these areas not because others are irrelevant, but because they are given the 
most emphasis and importance in schools. Developing appropriate assessment measures 
and practices, however, transcends developmental domains and is considered important 
for the assessment of culturally and linguistically diverse children in general. 

In addition to the concerns that attend the assessment of all children, there are
central issues inherent in the assessment of young children from non-English language
backgrounds. Implementation research suggests that assessment practices with young
ELLs continue to lag behind established legal requirements and ethical standards set
forth by APA, AERA and NCME. This is partly because of a lack of available
instruments normed on representative samples of English language learners, partly
because of inadequate professional development and training, and partly because of
insufficient research to inform best practice. Such is the case for the assessment of language,
cognitive skills, academic achievement, and other areas. Each of these areas is visited
briefly.

English Proficiency Assessment in Arizona 

Language is the key distinguishing feature for ELLs. Therefore, assessments of language 
in early and elementary school settings are used to determine oral English proficiency, to 
determine first- and second-language vocabulary skills, to predict literacy performance,
and ultimately to identify and place students into programs (including special education)
(Garcia, McKoon, & August, 2006). Prior to the 2004-2005 school year, Arizona’s
procedures for reclassifying English language learners (ELLs) to non-ELL (or FEP)
status were based on multiple measures related to student language proficiency and
academic achievement. In that year, however, Arizona adopted the Stanford English
Language Proficiency Test (SELP), a measure that provides an indication of language
proficiency but not academic attainment, and began using it as the sole criterion to 
reclassify ELLs to English proficient status. 

Mahoney, Haladyna and MacSwan (2009) investigated the appropriateness of
relying on a single measure to reclassify English learners (ELs) to non-EL (or FEP)
status in Arizona. According to these researchers, Arizona’s change in reclassification
procedures was problematic, in that it disregarded the view shared by AERA, APA,
NCME, and others that relying on a single testing measure for high stakes
educational decisions is inappropriate. Moreover, citing observations of several state
teachers and administrators, the researchers suggest that Arizona’s new single-test
reclassification procedure removes many ELLs from language services before they are
ready to succeed in mainstream classes.

Therefore, with the objective of examining the effectiveness of the SELP as a
reclassification tool, Mahoney and colleagues proposed two research questions. The first
asked whether SELP-reclassified ELs develop the necessary English
language skills to be successful in the English language curriculum. The second
focused on how the SELP differed from tests that had previously been used for
reclassification purposes in the state.

In order to answer their first question, Mahoney and colleagues analyzed the
performance of SELP-reclassified students from a Phoenix metropolitan area school
district in grades 3-8 on Arizona’s Instrument to Measure Standards (AIMS) test in
2005. Results from this group were statistically compared to those of a control group,
which consisted of students who had been reclassified in 2004 through multiple
measures (in this case, the Woodcock-Munoz test together with the Reading
Comprehension subtest of the Stanford Achievement Test, SAT-9). Results showed that
the control group outperformed the SELP-reclassified students on all parts of the AIMS,
and the percentage of reclassified students was higher in 2005 than in 2004. The research
team interpreted these results as evidence of premature reclassification by the SELP,
which could jeopardize reclassified ELs’ performance in mainstream classrooms.

In order to answer their second question (how the SELP differs from tests that
were previously used for reclassification purposes in the state), Mahoney and colleagues
compared the consistency of pass/fail decisions and the passing rates of the SELP to
those of the Language Assessment Scales test (LAS), which was one of the tests that had
been used in previous years in Arizona. Both tests were administered to a group of 288
students from one Phoenix metropolitan area school within a short period of time.
Results showed that 17% of the students were not classified consistently by the two tests,
and the SELP passing rate was statistically significantly higher than that of the LAS.

Mahoney, Haladyna and MacSwan’s conclusion was that the SELP is probably
over-reclassifying ELs into FEP status, as teachers and administrators had already
perceived. They emphasize the need for reclassification tools that rely on multiple
measures, as recommended by the measurement community, and suggest that states must
adopt procedures that have a language component as well as an academic achievement
indicator for reclassification decisions. The present study extends the work of Mahoney
et al. (2009), with the goal of evaluating the strength of the relationship between
AZELLA (the test that replaced the SELP) and AIMS subtest scores for students
identified as English Language Learners in Arizona.

Methods

This study examined the relationship between performance on the AZELLA and
performance on the state’s NCLB-mandated standardized achievement test, the AIMS,
in a sample of ELL students. The study sought to determine whether the relationship
between the two tests is consistent across grade levels.

Participants 

All participants attended elementary or middle schools within a mid-size, urban 
school district in the Southwest United States. During the 2008-2009 school year, this 
district served approximately 8,500 students who were enrolled in 17 schools. Archival 
performance data for participants were provided by the school district and there was no 
direct contact between the researchers and the participants in this study. 

Only children who were administered and received valid scores on the Arizona 
Instrument to Measure Standards (AIMS) and the Arizona English Language Learner 
Assessment (AZELLA) during the 2008-2009 school year were included in the study. 
For each participant, the anonymous identification number assigned by the district was
used to match student performance on the two tests. If participants were administered the
AZELLA more than once, only scores for the first administration were included.
Additional sampling criteria were: (a) enrollment in the third, fifth, or eighth grade and
(b) English Language Learner classification. These criteria yielded a sample of 710
students (see Table 1). Of the participants, 378 (53%) were male and
332 (47%) were female. Ninety-three percent of the participants were identified as
Hispanic, and Spanish was the primary language spoken (84%).



Table 1. Sample size by grade level

Grade Level    n
3              349
5              187
8              174



Instruments 

Arizona Instrument to Measure Standards (AIMS). AIMS is a standardized 
achievement measure designed to assess student performance in three academic 
categories: mathematics, reading, and writing (ADE: Azella Technical Manual). 
Reliability of the 2009 AIMS reading and math subtests was estimated with Cronbach’s
measure of internal consistency. For English-language learners in the grades targeted in
this study, alpha coefficients ranged from .82 to .91. Internal consistency was generally
higher for mathematics than reading, and higher for lower grades than higher grades.

AIMS tests contain embedded items from the TerraNova, making it possible to
derive both criterion-referenced (AIMS CRT) and norm-referenced (AIMS NRT) scale 
scores. As the AIMS CRT and NRT do not contain the same items, inter-correlations 
between the two forms were provided as evidence for construct validity. The developers 
report high correlations between the two forms when assessing the same construct and 
lower correlations among dissimilar constructs (e.g., Reading and Mathematics). For the 
current study, AIMS CRT scores on the reading and mathematics subtests were analyzed. 

Arizona English Language Learner Assessment (AZELLA). AZELLA is a
criterion-referenced test used by the state of Arizona to assess English proficiency for the
purpose of determining whether students receive ELL services. Developed alongside
Arizona’s K-12 English Language Proficiency standards, AZELLA was intended to
augment the Stanford English Language Proficiency (SELP) test. The technical manual
estimates alignment to state standards to be 85%. Depending on grade level, several
forms of AZELLA are administered. The Elementary form is used for students in grades
3, 4, and 5. The Middle Grades form is administered to students in grades 6, 7, and 8.
Both forms contain similar item types (i.e., multiple-choice; extended response) and yield
scores on four subtests: Speaking, Listening, Reading, and Writing. Subtest scores are
combined to form a Total Composite score. Evidence for the reliability of AZELLA is
provided with Cronbach’s alpha. Coefficients for targeted grades were high, ranging
from .90 to .97. Inter-correlations among subtest scores were rational, providing
evidence for criterion-related validity. Composite scores on the AZELLA Elementary
and Middle Grades forms were used in this study.

Data Analysis 

Attenuation-corrected Pearson correlation coefficients were calculated to
investigate the relationship between AIMS subtest scores and AZELLA composite scores.
Scatter plots were examined for outliers and to rule out non-linear relationships among
the dependent variables (Green & Salkind, 2005). Alpha was set at .05, and Bonferroni
methods were used to control the error rate across the multiple correlations. However, due
to the large sample sizes and the nature of the variables being compared, it was assumed
that all correlations would be statistically significant. The square of r was calculated and
served as the measure of effect size. To evaluate the consistency of the relationship between
AIMS and AZELLA performance across grade levels, a Fisher’s z transformation was
performed and 95% confidence limits were obtained using methods described by Zou
(2007). Due to the requirement that participants must have been administered both AIMS
and AZELLA, very few cases were missing values on one or more variables. Two cases were
missing scores on the AIMS Reading subtest and were deleted listwise from the eighth
grade analysis.
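
As an illustration of the first steps of this analysis, the following minimal Python sketch applies the standard correction for attenuation (dividing the observed correlation by the square root of the product of the two reliabilities), squares the result to obtain the effect size, and divides alpha across six correlations (three grades by two subtests). The observed correlation of .60 is a hypothetical value chosen only for illustration; the reliabilities of .90 and .82 are simply the lower ends of the ranges reported above, not the coefficients for any particular grade. The function names are assumptions of this sketch, and the study's actual software and inputs are not reported.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correct an observed correlation for unreliability in both measures.

    Standard correction for attenuation: r_xy / sqrt(rel_x * rel_y).
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical observed correlation (NOT a value from the study).
r_observed = 0.60
# Lower ends of the reliability ranges reported for AZELLA and AIMS.
rel_azella = 0.90
rel_aims = 0.82

r_corrected = disattenuate(r_observed, rel_azella, rel_aims)
effect_size = r_corrected ** 2               # proportion of shared variance
alpha_per_test = bonferroni_alpha(0.05, 6)   # 3 grades x 2 subtests

print(f"corrected r = {r_corrected:.3f}")
print(f"r squared   = {effect_size:.3f}")
print(f"alpha/test  = {alpha_per_test:.4f}")
```

Because the correction divides by a quantity no larger than 1, the corrected coefficient is never smaller in magnitude than the observed one, which is why the reliability estimates matter when comparing correlations across grades.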

Results

The results of the correlation analysis are provided in Table 2. As expected, all
correlations were significant after controlling for Type I error at .05. For reading, the
strongest association between AZELLA and AIMS Reading occurred in the sample of
third graders, r(347) = .71, p < .001, with 50% of the variance in AIMS Reading
accounted for by its linear relationship to AZELLA performance. The correlation for
fifth graders was also large, but less so than for third graders. The magnitude of the
correlation for the eighth grade sample was moderate, with AZELLA performance
accounting for only 11% of the variance in AIMS Reading.

Similar trends were found in the correlations between AZELLA and AIMS math,
although the correlations for each grade were slightly lower than those for reading. As
with reading, the correlation for third graders was high, r(347) = .61, p < .001, r² = .37,
and larger in magnitude than those associated with fifth graders or eighth graders.
Overall, the results of the correlation analysis suggest that students who perform well on
AZELLA also tend to perform well on AIMS, although this relationship is slightly
stronger for reading than math, and much stronger for third graders.

To further investigate the relationship between grade level and the association
between AZELLA and AIMS, 95% confidence intervals were calculated for between-grade
differences for both content areas using Fisher’s z transformation. As shown in
Table 3, the hypothesis that the strength of association between AZELLA performance
and AIMS Reading performance decreases as grade level increases is supported. With
95% confidence, the correlation between AZELLA and AIMS Reading is .08 to .41
larger for third graders than for fifth graders. The confidence intervals for the remaining
grade comparisons suggest that the strength of association is .13 to .46
larger for third graders than for eighth graders and .02 to .41 larger for fifth graders than
for eighth graders. Interpretations of these last two comparisons (third vs. eighth; fifth vs.
eighth) need to consider that the two groups were administered similar, but different,
forms of AZELLA. Despite this, it does seem that, overall, the correlation between
AZELLA and AIMS Reading performance decreases significantly as grade level increases.
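
The interval construction for a difference between two independent correlations can be sketched as follows. This is a minimal Python illustration of the approach described by Zou (2007): each correlation first receives its own Fisher z confidence interval, and the two sets of limits are then combined into limits for the difference. The inputs are the third and fifth grade reading correlations and sample sizes from Table 2. Because the study does not report exactly how attenuation correction and rounding entered the interval computation, the output of this sketch should not be expected to match Table 3; the function names are assumptions of the sketch.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def fisher_ci(r: float, n: int, z_crit: float = Z_95) -> tuple[float, float]:
    """Confidence interval for one correlation via Fisher's z transformation."""
    z = math.atanh(r)                # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)      # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def zou_diff_ci(r1: float, n1: int, r2: float, n2: int,
                z_crit: float = Z_95) -> tuple[float, float]:
    """Confidence interval for r1 - r2 from two independent samples."""
    l1, u1 = fisher_ci(r1, n1, z_crit)
    l2, u2 = fisher_ci(r2, n2, z_crit)
    diff = r1 - r2
    lower = diff - math.sqrt((r1 - l1) ** 2 + (u2 - r2) ** 2)
    upper = diff + math.sqrt((u1 - r1) ** 2 + (r2 - l2) ** 2)
    return lower, upper


# Third vs. fifth grade, AIMS Reading (r and N taken from Table 2).
lower, upper = zou_diff_ci(0.71, 349, 0.53, 187)
print(f"95% CI for the difference: [{lower:.2f}, {upper:.2f}]")

# An interval that excludes zero indicates a statistically significant
# difference between the two correlations at the chosen level.
print("excludes zero:", lower > 0 or upper < 0)
```

An interval built this way is generally not symmetric around the observed difference, because the Fisher z limits for each correlation are themselves asymmetric when r is far from zero.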

Table 2. Correlations among AZELLA composite scores and AIMS subtest scores by grade

                               AIMS Math SS    AIMS Reading SS
Third Grade     Pearson’s r    .61             .71
                R-squared      .37             .50
                N              349             349
Fifth Grade     Pearson’s r    .49             .53
                R-squared      .24             .28
                N              187             187
Eighth Grade    Pearson’s r    .30             .33
                R-squared      .09             .11
                N              172             174



The results do not support our hypothesis as strongly in math. The 95% intervals
for the difference between third and fifth graders and the difference between fifth and
eighth graders contained zero, indicating that we do not have sufficient evidence to conclude
that these groups differ significantly. However, the lower bounds are very close to zero,
suggesting that, despite the lack of statistical significance, there may be considerable
differences in the strength of association among grades, with a decreasing trend as grade
level increases.
Although these results are suggestive, one limitation of the current study was an inability 
to control for immigration status and length of time in ELD services. Including these 
variables in future research would help to clarify the differences in predictive ability 
described in this study. 

Table 3. 95% CI for Differences in Pearson’s r

                 Mathematics         Reading
                 Lower    Upper      Lower    Upper
3rd and 5th      -.03     .32        .08      .41
3rd and 8th      .10      .44        .13      .46
5th and 8th      .00      .39        .02      .41



Conclusions 

Findings and Principles in the Assessment of ELLs 

This study finds that at higher grade levels, the “one-test” AZELLA becomes less
predictive of academic achievement. As a result, use of the AZELLA overpredicts
the transitioned student’s capacity to succeed academically in the regular classroom and
places a critical barrier to obtaining an equal education in Arizona. This finding calls
into question Arizona’s “one-test” procedure for identifying ELL students and their
transition to a non-service category, particularly transitioning into the English-only
educational curriculum. Hence, the gap between current practice in the assessment of
English language learners in Arizona and the standards set forth through research, policy, 
and ethics is largely a function of the gap between practical and optimal realities. Due to 
the many demands and constraints placed on teachers and schools from local, state, and 
federal governments, including budgeting responsibilities and the many programs 
implemented each school year, it can be extremely challenging to keep pace with best 
practices and ethical standards. However, given the large and increasing size of the 
young ELL child population in Arizona, the current focus on testing and accountability, 
and the documented problems in current assessment practices, improvements are critical. 
These improvements are necessary at all phases of the assessment process, including
pre-assessment and assessment planning, conducting the assessment, analyzing and
interpreting the results, reporting the results (in written and oral format), and determining 
eligibility and monitoring. 

Researchers and organizational bodies have offered principles for practitioners
engaged in the assessment of ELLs. Among the most comprehensive is a list from
the National Association for the Education of Young Children (NAEYC; Clifford et al.,
2005). Included as a supplement to the NAEYC position statement on early childhood
curriculum, assessment and program evaluation, Clifford et al. present detailed
recommendations “to increase the probability that all English language learners will have
the benefit of appropriate, effective assessment of their learning and development” (p. 1).
The last of these recommendations concerns further needs (i.e., research and practice) in
the field. Because these recommendations, presented here as principles, materialized
as a collaborative effort from a committee composed of over a dozen researchers in the
field, they are also representative of recommendations found in the literature.

First, assessment instruments and procedures should be used for appropriate 
purposes. Assessments should be used fundamentally to support learning, including 
language and academic learning. For evaluation and accountability purposes, ELLs 
should be included in assessments and provided with appropriate tests and 
accommodations. 

Second, assessments should be linguistically and culturally appropriate. This 
means assessment tools and procedures should be aligned with cultural and linguistic 
characteristics of the child. Tests should be culturally and linguistically validated to 
verify the relevance of the content (i.e., content validity) and the construct purported to 
be measured (i.e., construct validity). Moreover, in the case of norm-referenced tests, the 
characteristics of children included in the normative sample should reflect the linguistic, 
ethnic, and socioeconomic characteristics of the child. 

Third, the primary purpose of assessment should be to improve instruction. The 
assessment of student outcomes using appropriate tools and procedures should be linked 
closely to classroom processes. This means relying on multiple methods and measures, 
evaluating outcomes over time, and using collaborative assessment teams, including the 
teacher, who is a critical agent for improved learning and development. Assessment that 
systematically informs improved curriculum and instruction is the most useful. 

Fourth, caution ought to be used when developing and interpreting standardized 
formal assessments. Standardized assessments are used for at least three purposes — to 
determine program eligibility, to monitor and improve learning, and for accountability 
purposes. It is important that ELLs be included in large-scale assessments, and that these 
instruments continue to be used to improve educational practices and placements. 
However, those administering and interpreting these tests ought to use caution. Test 
development issues must be scrutinized, and evidence-based accommodations ought to 
be provided during accountability assessments. 

Finally, families should play critical roles in the assessment process. Under 
federal law, parents have the right to be included in the decision-making process 
regarding the educational placement for their child. Moreover, the educational benefit of 
the assessment process for a given child is optimal when parents’ wishes are voiced and 
considered throughout. Although family members should not administer formal 
assessments, they are encouraged to be involved in the selection of assessments and the 
interpretation of results. The process and results of assessment should be explained to 
parents in a way that is meaningful and easily understandable. 

Future Directions for Practice in Arizona 

As mentioned, there is a gap between current assessment practice of young ELLs 
and what the research and the legal and ethical standards suggest is best practice. It is 
important, therefore, that research and practice continue an ongoing dialogue to improve 
this scenario. There are three ways in which researchers and scholars will be able to 
engage assessment scholarship to this end. Support and necessary funding should be 
provided by policy makers, institutions of higher education, and other research programs 
to pursue this course. 

First, the field needs more tests developed and normed especially for English 
language learners. This will require a bottom-up approach, meaning assessment tools, 
procedures, and factor analytic structures are aligned with cultural and linguistic 
characteristics of ELL children, as opposed to top-down approaches where, for example, 
test items are simply translated from their original language to the native languages of 
young ELLs. Norm-referenced tests should also take into account important characteristics 
of the child, including their linguistic, ethnic, and socioeconomic histories. 

Second, it is time that conceptual and empirical work on student assessment move 
beyond the student level. That is, the majority of the present discussion reflects the 
extant literature which has focused heavily on the assessment of processes and outcomes 
within the student: assessing language and academic learning. With this knowledge 
base, teachers and schools are expected to adjust aspects of the environment to improve 
learning. It has become clear that processes outside the student also affect learning, 
including those within the classroom (e.g., teacher-student interactions, peer-to-peer 
interactions), the home (e.g., frequency of words spoken, number of books), and the 
school (e.g., language instruction policies). Yet the field presently lacks the conceptual 
frameworks and measures necessary to move this research forward and systematically 
improve student learning. Preliminary research on the role of context in learning suggests that 
variations in environmental factors can increase student engagement and participation 
(Christenson, 2004; Goldenberg, Rueda, & August, 2006), which, in turn, can lead to 
increased learning, and that the influence of contextual contingencies on learning 
outcomes is mediated by children’s motivation to learn (Rueda, 2007; Rueda, 
MacGillivray, Monzo & Arzubiaga, 2001; Rueda & Yaden, 2006). Conceptual 
frameworks should account for the multilevel nature of contexts, including the nesting of 
individuals within classrooms and families, classrooms within schools, and schools 
within school districts, communities, and institutions. Moreover, the role of culture and 
the feasibility of cultural congruence across within- and out-of-school contexts will be 
important to this work. Meaningful empirical work in this area will require the 
convergence of research methods (e.g., multi-level statistics and the mixing of qualitative 
approaches with quasi-experimental designs) and social science disciplines (e.g., 
cognitive psychology, educational anthropology, sociology of education). 

Finally, as the population of young ELLs continues to grow, more serious 
psychometric work is needed so that these students can benefit from their “right” to be 
assessed reliably and validly, and thus be served effectively. Arizona is presently 
failing its ELL students in this regard. 